This paper focuses on the probability that a portion of DNA closes on itself through thermal
fluctuations. We investigate the dependence of this probability upon the size r of a protein bridge
and/or the presence of a kink at half DNA length. The DNA is modeled by the Worm-Like Chain
model, and the probability of loop formation is calculated in two ways: exact numerical evaluation
of the constrained path integral and the extension of the Shimada and Yamakawa saddle point
approximation. For example, we find that the looping free energy of a 100 base pair DNA decreases
from 24 k_BT to 13 k_BT when the loop is closed by a protein of length r = 10 nm. It further decreases
to 5 k_BT when the loop has a kink of 120° at half-length.

PACS numbers: 36.20.Hb, 46.70.Hg, 82.35.Pq, 87.14.Gg, 87.15.Aa, 87.15.La 

I. INTRODUCTION: DNA LOOPS IN GENE TRANSCRIPTION REGULATION 

Gene expression is regulated by a wide variety of mechanisms. These activation and repression phenomena may occur at every expression step (translation, transcription, etc.) and involve interactions between several biological molecules (DNA, RNAs, proteins, etc.). For instance, proteins bound to specific DNA sequences may turn gene transcription on or off by interacting with each other. By bringing those proteins closer together, DNA looping can ease their interactions. Such looping events may be observed over a wide range of lengths, from hundreds to thousands of base pairs (bp). The looping probability was first measured through the cyclization of DNA segments with cohesive ends in solution. Once the loop is formed, proteins called ligases stabilize it. It is then possible to count the circular DNAs with respect to the linear ones. Loop formation mediated by proteins has also been studied experimentally. Two examples are the loops formed by the LacR or the GalR transcriptional repressors. Two units of such proteins bind at two specific positions along the same DNA and associate to form a complex when the binding sites come into contact. The formation of such loops has recently been studied using micromanipulation experiments on a single DNA molecule [4, 5]. The study of the GalR mediated loop has shed light on the role of a third protein called HU that sharply bends (i.e. kinks) the DNA at half-length.

The DNA loop probability depends mainly on its length and flexibility. Long DNAs (typically longer than 1500 bp) essentially behave as Gaussian Polymers (GP): the cyclization cost is mainly of an entropic nature. On the contrary, for small lengths DNA cyclization is difficult mainly because of the bending energy cost. The computation of the elastic energy for the Worm-Like Chain (WLC) model can be performed analytically; numerical methods have also been employed when electrostatic properties are included [13]. At intermediate length scales (from about 500 bp) elastic rigidity and entropic loss are both important. Several approximations have been developed to study this range of lengths, among them the calculation of the fluctuations around the lowest bending energy configurations performed by Shimada and Yamakawa. Numerical approaches have also been developed: Monte Carlo and Brownian dynamics based simulations, as well as numerical calculations of the WLC path integral under the closed ends constraint [21]. This last method allowed Yan, Kawamura and Marko to study the elastic response of DNA subject to permanent or thermally excited bendings caused by binding proteins (such as HU) or inhomogeneities along the DNA double helix [22]. All these studies provide a better understanding of the underlying regulation phenomena despite their overall complexity.

In this paper we study two processes that turned out to be important in DNA looping: the size of the protein complex clamping the loop, which acts as a bridge between the two DNA ends, and mechanisms implying a loss of DNA stiffness, which are taken into account in an effective way by kinking the WLC at half contour length. In section II we define the model and the methods: we describe the numerical approach (§ II A) and the analytical Saddle Point Approximation (SPA, § II B) used to calculate the r-dependent closure factor and the looping free energy. In section III we compare the numerical and SPA results with previous experimental and theoretical results. In section IV we extend the numerical and SPA approaches to a kinked loop; we discuss our results and we propose a simple formula that accounts for both the protein bridge and kink effects (§ IV C). We conclude (section V) by sketching how to include omitted DNA properties which may also play an important role in its closure, such as twist rigidity or electrostatic interactions.
II. DEFINITIONS AND METHODS



We use the well known Worm-Like Chain (WLC) model. The DNA polymer is described as an inextensible, continuous, differentiable curve of contour length L, with unit tangent vector t(s) (0 ≤ s ≤ L). The polymer is characterized by the persistence length A beyond which tangent vectors lose their alignment: ⟨t(s) · t(s')⟩ = exp(−|s − s'|/A). The energy of a configuration of the polymer stretched under an external force f e_z reads

$$E[\mathbf{t}; L, f] = \frac{1}{2\beta}\, A \int_0^L \left(\frac{d\mathbf{t}(s)}{ds}\right)^2 ds \;-\; f \int_0^L \mathbf{e}_z \cdot \mathbf{t}(s)\, ds, \tag{1}$$

where we use β = 1/k_BT. Neither twist elasticity nor extensibility will be considered. The partition function is

$$Z(L, f) = \int \mathcal{D}\mathbf{t}\; \exp\{-\beta E[\mathbf{t}; L, f]\}. \tag{2}$$

Notice that summation over all initial and final tangent vector orientations, t(0) and t(L), is implicitly understood in this path integral.

In this paper we are interested in the formation of a loop in a DNA molecule, and the Probability Density Functions (PDFs) of end-to-end distances play an important role. The quantities under study are denoted by Q, S, P and J respectively and defined as follows:

• Q is the end-to-end extension r = (x, y, z) PDF at zero force,

$$Q(\mathbf{r}, L) = \int \mathcal{D}\mathbf{t}\; \delta\!\left(\int_0^L \mathbf{t}(s)\, ds - \mathbf{r}\right) \exp\{-\beta E[\mathbf{t}; L, f = 0]\}. \tag{3}$$

In the absence of force Q depends on its argument r through its modulus r = |r| only, and we may introduce the radial PDF

$$S(r, L) = 4\pi r^2\, Q(r, L). \tag{4}$$

• The z extension PDF reads

$$P(z, L) = \int_{-L}^{L} dx \int_{-L}^{L} dy\; Q[(x, y, z), L]. \tag{5}$$

In the absence of external force, notice that the x or y extension PDFs are given by P too. Interestingly, the radial and z extension PDFs are related to each other through the useful identity

$$S(r, L) = -2r\, \frac{dP}{dz}(r, L). \tag{6}$$

• The cyclization factor

$$J(L) = Q(0, L) \tag{7}$$

is defined as the probability density that the two ends of the DNA are in contact with one another.

• The r-dependent closure factor is

$$J(r, L) = \frac{\int_0^r S(r', L)\, dr'}{\frac{4}{3}\pi r^3}. \tag{8}$$

It gives the probability density for the two ends of the chain to stay within a sphere of radius r. It is easy to check that J(r, L) → J(L) when r → 0. For experimental convenience, the units used for J(L) and J(r, L) are moles per liter: 1 nm^-3 ≈ 1.66 mol·L^-1 = 1.66 M. In these units J(r, L) gives directly the concentration of one binding site in proximity of the other.

• Finally we consider the looping free energy cost

$$\beta\Delta G(r, L) = -\ln\!\left[J(r, L) \times \frac{4}{3}\pi r^3\right]. \tag{9}$$

Note that this definition includes neither the details of the geometry nor the affinities of the DNA/protein and protein/protein interactions. We actually assume the whole sphere of radius r to be the reacting volume, i.e. that the loop will form if the DNA ends happen to be in this sphere.
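The definitions (8) and (9) translate directly into a short numerical recipe. The sketch below is only illustrative: it uses the Gaussian-polymer radial PDF (with ⟨R²⟩ = 2LA) as a stand-in for the WLC S(r, L), and the function names, the trapezoidal rule and the chosen lengths are assumptions made for the example, not the procedure used in this paper.

```python
import math

M_PER_NM3 = 1.66  # 1 nm^-3 is about 1.66 mol/L, as stated in the text


def gaussian_radial_pdf(r, L, A):
    """Radial PDF S(r, L) of a Gaussian polymer with <R^2> = 2*L*A.

    Used here only as a simple stand-in for the WLC radial PDF.
    """
    a = 3.0 / (4.0 * L * A)
    return 4.0 * math.pi * r**2 * (a / math.pi) ** 1.5 * math.exp(-a * r**2)


def closure_factor(S, r, n=2000):
    """J(r, L) of Eq. (8) in nm^-3: integral of S over [0, r] divided by the sphere volume."""
    h = r / n
    total = 0.5 * (S(0.0) + S(r)) + sum(S(i * h) for i in range(1, n))
    return total * h / (4.0 / 3.0 * math.pi * r**3)


def looping_free_energy(J_nm3, r):
    """beta * Delta G of Eq. (9), with J expressed in nm^-3 and r in nm."""
    return -math.log(J_nm3 * 4.0 / 3.0 * math.pi * r**3)


# Hypothetical example: a long chain (L = 1000 nm) with A = 50 nm.
L_NM, A_NM = 1000.0, 50.0
S = lambda r: gaussian_radial_pdf(r, L_NM, A_NM)
J1, J10 = closure_factor(S, 1.0), closure_factor(S, 10.0)
dG1, dG10 = looping_free_energy(J1, 1.0), looping_free_energy(J10, 10.0)
print(f"J(1 nm)  = {J1 * M_PER_NM3:.3e} M, beta*dG = {dG1:.2f}")
print(f"J(10 nm) = {J10 * M_PER_NM3:.3e} M, beta*dG = {dG10:.2f}")
```

For such a long chain J barely depends on r, so the two free energies differ essentially by the reacting-volume term 3 ln(10).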

Despite intense studies of the WLC model, no exact analytical expression is known for Q and the quantities of interest here, namely J and S. However, approximations expanding from the two limiting regimes (entropic [10] and elastic [14, 15]) along with exact numerical computations are available. Hereafter, we resort to numerical as well as approximate analytical techniques (SPA) for calculating the cyclization factor J and the probability of almost closed DNA configurations.



A. Numerical Calculation of the Probabilities P, S and J 



Our starting point for the calculation of the z extension PDF is the Fourier representation of the Dirac δ-function in (3) and (5),

$$P(z, L) = \int_{-\infty}^{+\infty} \frac{dk}{2\pi}\; e^{+ikz}\, Z(L, f = -ik/\beta). \tag{10}$$

At fixed momentum k we are left with the calculation of the partition function Z(L, f) at (imaginary) force f = −ik/β. The path integral (2) defining Z is interpreted as the evolution operator of a quantum system, the rigid rotator under an external imaginary field,

$$Z(L, f) = \langle \text{final} |\, \exp\!\left[-\frac{L}{A}\, H(f)\right] | \text{initial} \rangle. \tag{11}$$

The entries of its Hamiltonian H are easily expressed in the spherical harmonics |ℓ, m⟩ basis: ⟨ℓ, m| H(f) |ℓ', m'⟩ = H_{ℓ,ℓ'}(f) δ_{m,m'} with

$$H_{\ell,\ell'}(f) = \frac{\ell(\ell+1)}{2}\, \delta_{\ell,\ell'} - \beta A f\; \frac{\ell\, \delta_{\ell',\ell-1} + \ell'\, \delta_{\ell,\ell'-1}}{\sqrt{(2\ell+1)(2\ell'+1)}}. \tag{12}$$

The entries of H do not depend on the azimuthal number m due to cylindrical symmetry around the force axis. Finally, integration over all initial and final orientations for the tangent vectors at the ends of the polymer chain selects |initial⟩ = |final⟩ = |0, 0⟩.
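As an illustration of (11) and (12), the truncated Hamiltonian can be built and exponentiated in a few lines. This is a minimal sketch under stated assumptions: it takes the tridiagonal form of the rigid rotator in a field for the m = 0 sector, uses a plain NumPy eigendecomposition instead of the Expokit routines mentioned in the text, and the names `rotator_hamiltonian`, `partition_function` and the dimensionless argument `beta_A_f` (standing for βAf) are hypothetical.

```python
import numpy as np


def rotator_hamiltonian(beta_A_f, l_max=50):
    """Truncated matrix H_{l,l'}(f) in the m = 0 sector.

    beta_A_f is the dimensionless force beta*A*f; it may be complex.
    """
    n = l_max + 1
    l = np.arange(n)
    H = np.zeros((n, n), dtype=complex)
    H[l, l] = l * (l + 1) / 2.0                      # free rigid-rotator term
    ll = l[1:]                                       # l = 1 .. l_max
    coupling = ll / np.sqrt((2 * ll + 1) * (2 * ll - 1))
    H[ll, ll - 1] = -beta_A_f * coupling             # <l| cos(theta) |l-1> term
    H[ll - 1, ll] = -beta_A_f * coupling             # symmetric partner
    return H


def partition_function(L_over_A, beta_A_f, l_max=50):
    """Z(L, f) = <0,0| exp[-(L/A) H(f)] |0,0> via an eigendecomposition of H."""
    H = rotator_hamiltonian(beta_A_f, l_max)
    w, V = np.linalg.eig(H)
    e0 = np.zeros(l_max + 1, dtype=complex)
    e0[0] = 1.0
    coeffs = np.linalg.solve(V, e0)                  # components of |0,0> on the eigenvectors
    return np.sum(V[0, :] * np.exp(-L_over_A * w) * coeffs)


# Example: one Fourier component of Eq. (10), i.e. an imaginary force f = -ik/beta.
k_times_A = 0.5
print(partition_function(2.0, -1j * k_times_A))
```

Evaluating `partition_function` on a grid of k values gives the samples whose inverse Fourier transform is P(z, L); comparing `l_max=50` with `l_max=100` is a quick way to check that the cut-off is harmless.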

A recent paper [21] used Mathematica to compute the vacuum amplitude (11) through a direct matrix exponentiation. We have instead used the Expokit library [23] since it proves to be more accurate and faster for intensive numerical calculations. We truncated the Hamiltonian (12) in such a way that the outcome is insensitive to the cut-off on the harmonics order. The (inverse) Fourier transform (10) is then handled by a Fast Fourier Transform (FFT) algorithm. This task is facilitated in particular by the inextensibility constraint, which makes the distribution bandwidth limited. Our results for the z extension PDF are shown in Fig. 1. The cross-over from the rigid elastic regime (L/A = 0.1, 0.5, 1) to the flexible entropic regime (L/A = 5, 10, 15) is clearly visible. Using (6) then gives us access to S, the radial extension PDF (see Fig. 2). The most probable value r* of the distance switches from full extension r* ≈ L for contour lengths L ≪ A (elastic dominated regime) to the GP most probable extension r* = (4LA/3)^{1/2} for longer ones, L ≫ A (entropic dominated regime). Also note that S always goes continuously to zero in the r → L limit due to the WLC inextensibility. We have finally calculated the r-dependent closure factor J according to (8) by numerical integration of S, and the looping free energy cost ΔG defined in (9).
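The step from the z extension PDF to the radial PDF relies only on the identity S(r, L) = −2r dP/dz(r, L). A minimal sketch of that step is given below, assuming a central finite difference for the derivative and a Gaussian-polymer P(z) (with ⟨R²⟩ = 2LA) as test input; both choices, and all names, are illustrative assumptions rather than the scheme used for the figures.

```python
import math


def radial_pdf_from_z_pdf(P, r, h=1e-2):
    """S(r) = -2 r dP/dz evaluated at z = r, with a central finite difference of step h."""
    dP_dz = (P(r + h) - P(r - h)) / (2.0 * h)
    return -2.0 * r * dP_dz


# Gaussian-polymer test case (lengths in nm): P(z) is a 1D Gaussian of variance 2*L*A/3.
L_NM, A_NM = 1000.0, 50.0
a = 3.0 / (4.0 * L_NM * A_NM)
P_gauss = lambda z: math.sqrt(a / math.pi) * math.exp(-a * z * z)
S_exact = lambda r: 4.0 * math.pi * r * r * (a / math.pi) ** 1.5 * math.exp(-a * r * r)

r_test = 200.0
print(radial_pdf_from_z_pdf(P_gauss, r_test), S_exact(r_test))
```

For the Gaussian chain the two printed numbers coincide up to the finite-difference error, which is a convenient sanity check before applying the same routine to a numerically tabulated WLC P(z, L).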

Let us now discuss the numerical errors that could be important when calculating probabilities of rare events. The main sources of error are the truncation of the Hamiltonian (12) and the integration step used to compute the r-dependent closure factor (8). As mentioned above, the cut-off on the harmonic order was systematically chosen so that convergence is observed. We have used ℓ = 50 after having verified that the result is unchanged for ℓ = 100. Concerning integration, the limitation comes from the number of available data points in the range 0 < r' < r, which is directly related to the k sampling of the partition function Z. For r = 1 nm the numerical integration is still reliable, but decreasing r further turns out to be critical. Other potential sources of error are negligible. Indeed the bandwidth limitation of P(z, L) prevents any aliasing during the FFT (10), and the derivative of P in (6) can actually be skipped through an integration by parts of (8) to compute J. Further hypothetical errors would then come from the Expokit library [23] itself, but its routines were coded to compute matrix exponentials accurately over a broad range of matrices [23]. We have checked the numerical precision of our method by comparison with the exact values for the first even moments of S(r) (Fig. 2, inset); moreover, as shown in Fig. 2, S(r) agrees with the Wilhelm and Frey expansion for small L/A values; finally we will see in section III that the numerically calculated cyclization factor J(L) (7) is in agreement with the Shimada-Yamakawa and Gaussian approximation results for small and large L respectively (Fig. 3).

FIG. 1: Numerical computation of the z extension PDF over a wide range of contour lengths L (horizontal axis: relative z/L extension). As expected the agreement with the SPA prediction (see section II B) improves as L decreases (upper bound L/A ≲ 1). The long WLC behavior is caught by the GP approximation as soon as, say, L/A > 5.

FIG. 2: Numerical computation of S, the radial r extension PDF (horizontal axis: relative r/L extension). The outcomes of the numerical calculations are tested against exactly known values for the first even moments ⟨r^{2n}⟩ (inset). The shape of S compares very well to the widely used Wilhelm and Frey (W&F) expansion, valid up to L/A ≲ 2. Similar tests were achieved with other popular approximation schemes (data not shown).



B. Saddle Point Approximation for J 



In addition to the exact numerical calculations detailed above, we have carried out approximate calculations based on a saddle point estimate of the partition function (2). We follow the Shimada and Yamakawa calculation for the saddle point configuration, extending it to an opened DNA. The saddle point configuration for a closed loop is shown in Fig. 3 (inset): it is a planar loop. The tangent vector at position s along the chain is characterized by its angle θ(s) with respect to the end-to-end extension r. The optimal configuration is symmetric with respect to the perpendicular axis while the half-length angle is θ(L/2) = 180°. The initial angle θ(0) is chosen by the minimization of the bending energy of the chain. The optimization gives rise to the following condition on the parameter x = cos[θ(0)/2]:

$$(1 + r/L)\, K(x^2) = 2\, E(x^2), \tag{13}$$

where K(x²) = K(π/2, x²) and E(x²) = E(π/2, x²) are the complete elliptic integrals of the first and second kinds respectively [23]. The corresponding elastic energy is

$$\beta\Delta E(r, L) = \frac{4\, K^2(x^2)}{L/A} \times (2x^2 - 1 + r/L). \tag{14}$$
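The saddle point condition (13) has no closed-form solution, but it is easily solved numerically. The sketch below assumes that the energy (14) carries the elastica prefactor 4K²(x²)/(L/A), which reproduces the known closed-loop value of about 14.055 A/L; the arithmetic-geometric-mean evaluation of K and E, the bisection, and all function names are choices made for this example only.

```python
import math


def ellip_KE(m):
    """Complete elliptic integrals K(m) and E(m), with parameter m = x^2, via the AGM."""
    a, b = 1.0, math.sqrt(1.0 - m)
    c2_sum = 0.5 * m          # running sum of 2^(n-1) * c_n^2, starting with c_0^2 = m
    weight = 1.0              # 2^(n-1) for n = 1
    while abs(a - b) > 1e-15 * a:
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        c2_sum += weight * c * c
        weight *= 2.0
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - c2_sum)


def solve_x2(r_over_L):
    """Solve (1 + r/L) K(x^2) = 2 E(x^2) for x^2 by bisection (requires r < L)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        K, E = ellip_KE(mid)
        if (1.0 + r_over_L) * K - 2.0 * E < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bending_energy(r_over_L, L_over_A):
    """beta * Delta E(r, L) in units of k_B T for the open saddle point configuration."""
    m = solve_x2(r_over_L)
    K, _ = ellip_KE(m)
    return 4.0 * K * K / L_over_A * (2.0 * m - 1.0 + r_over_L)


print(solve_x2(0.0), bending_energy(0.0, 1.0))   # closed loop: x^2 near 0.826, energy near 14.06
print(bending_energy(0.4, 0.5))                  # opened loop with r/L = 0.4 and L/A = 0.5
```

The closed-loop call recovers the Shimada and Yamakawa bending energy, and any r > 0 lowers the energy because the chain is less sharply bent.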

The end-to-end extension PDF is then approximated as

$$Q(r, L) \approx C(r, L)\, \exp[-\beta\Delta E(r, L)]. \tag{15}$$

The prefactor C(r, L) should be calculated by taking into account quadratic fluctuations around the saddle point configuration. Since such a calculation is quite involved, we will actually only extend the Shimada and Yamakawa result, which was computed considering fluctuations around a closed loop. In M = mol·L^-1 units, with A expressed in nm, this reads

$$C_{SY}(L) = \frac{1.66 \times 112.04}{A^3\, (L/A)^5} \times \exp(0.246 \times L/A). \tag{16}$$

For this factor to fit the correct fluctuations (around the opened loop) we have to consider the fluctuations around a fake closed loop of similar bending energy. Such a loop may be obtained by considering the optimal configuration of a loop of contour length (L + 2r) instead of L, as shown in Fig. 4 (bottom, inset). The choice of the factor 2r derives from the following geometrical considerations: the closed saddle point configuration has an initial angle θ(0) = 49.2°, and the length ΔL closing the loop could be calculated for each value of r by requiring

$$\int_0^{\Delta L} \cos\theta(s)\, ds = r. \tag{17}$$

The angle θ(s) increases slightly on the first part of the trajectory, θ(s) > θ(0), and therefore ΔL > 1.53 r. We have chosen ΔL = 2r as an approximate value; this approximation has the advantage that it can be directly inserted in the fluctuation expression, C(r, L) ≈ C_SY(L + 2r), in equation (15) to obtain

$$Q(r, L) = C_{SY}(L + 2r)\, \exp[-\beta\Delta E(r, L)], \tag{18}$$

where ΔE is given in formula (14) and C_SY is given in formula (16). The validity of this approximation for the fluctuation prefactor was checked by comparing J(r, L), obtained from Q(r, L) through formula (8), with the numerical results. The good agreement shown in Fig. 4 allows us to obtain a semi-analytical formula for the loop probability with a finite interacting volume, which is valid for molecules of up to 2 kb (kilo base pairs).
FIG. 3: Cyclization factor as a function of the DNA length with: the Gaussian model (gray line); the WLC model with the Shimada and Yamakawa formula (dotted black line); the WLC model with the numerical calculation (full black line). The most probable length is 500 bp. Inset: the lowest bending energy configuration of a closed loop.



III. RESULTS: THE EFFECT OF A PROTEIN BRIDGE 



In Fig. 3 we show the cyclization factor J(L) for lengths L up to 3 kb. The most probable loop length is about L* = 500 bp, that is L*/A ≈ 3.5. Both shorter and longer cyclized lengths are less probable: stiffness makes the bending of smaller polymers difficult, while entropy makes the ends of longer polymers unlikely to meet. The numerical cyclization factor is compared with the saddle point calculation of Shimada and Yamakawa and with the Gaussian polymer (GP) model. The former is in good agreement with the numerical results for lengths smaller than about 1.5 kb while the latter works well for loops larger than 2.5 kb.
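The position of the maximum can be checked from the Shimada and Yamakawa expression alone. The sketch below assumes the standard form of that formula, with numerical prefactor 112.04, closed-loop bending energy 14.055 A/L and fluctuation term 0.246 L/A, together with A = 50 nm and 0.34 nm per base pair; these numerical values and the function name are assumptions of the example, not data taken from Fig. 3.

```python
import math

A_NM = 50.0        # assumed persistence length in nm
NM_PER_BP = 0.34   # assumed rise per base pair in nm


def j_sy_molar(L_nm, A_nm=A_NM):
    """Shimada-Yamakawa cyclization factor J(L) in mol/L (assumed standard form)."""
    t = L_nm / A_nm
    prefactor = 1.66 * 112.04 / (A_nm**3 * t**5)
    return prefactor * math.exp(-14.055 / t + 0.246 * t)


# Scan loop lengths within the range where the formula is expected to hold.
lengths_bp = range(100, 1501)
best_bp = max(lengths_bp, key=lambda n: j_sy_molar(n * NM_PER_BP))
j_max = j_sy_molar(best_bp * NM_PER_BP)
print(best_bp, best_bp * NM_PER_BP / A_NM, j_max)
```

With these assumptions the maximum falls close to 500 bp, i.e. L*/A slightly below 3.5, with J of the order of 10^-7 M.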

The effects of the finite size r of the protein bridge are displayed in Fig. 4, which shows J(r, L) (on the left) and ΔG(r, L) (on the right) for lengths L ranging from 75 bp (i.e. 25 nm) to 300 bp (i.e. 100 nm) and r ranging from 1 nm to 10 nm. The numerical results (on the top) are in good agreement with the SPA results (on the bottom). Note that for small lengths L, J(r, L) has a peak for the L ≈ r event corresponding to rigid rod-like configurations. Fig. 4 does not show this peak, occurring for r = 10 nm at about L = 30 bp (or L = 10 nm), because we focus on the cyclization events. As shown in this figure, r values ranging from 1 nm to 10 nm make no difference for the r-dependent closure factor when contour lengths L larger than 300 bp (or 100 nm) are considered. The cyclization factor J(L) is evaluated as the limit r → 0 of J(r, L). In practice convergence is reached as soon as r is about one order of magnitude smaller than L. In the range L > 75 bp (25 nm), J(5 nm, 150 bp) converges to J(1 nm, 150 bp). On the other hand J(r, L) is considerably different from J(L) when r is of the same order of magnitude as L. For example, for loops of L = 100 bp (34 nm) an end-to-end extension of 10 nm increases the closure factor by two orders of magnitude. Therefore proteins of size ≈ 10 nm are expected to produce drastic enhancements in the looping of short DNA sequences.

In terms of energetics (see Fig. 4, right), cyclizing a 100 bp DNA sequence costs 25 k_BT when the loop ends are required to stay within a sphere of radius r = 1 nm. This cost decreases to 13 k_BT if the sphere has the typical protein size of r = 10 nm. For loop lengths larger than 300 bp the only difference between the three curves of Fig. 4 (right) is a free energy shift due to the difference in the reacting volume. For instance, a reaction radius of 10 nm decreases the looping free energy by 3 × ln(10) ≈ 7 k_BT with respect to a 1 nm reaction radius.

FIG. 4: Closure factor (left panel) and free energy (right panel) with a protein bridge of size r = 1 nm (dashed line), 5 nm (dotted line) and 10 nm (full line). The error bars are shown when they are larger than the symbol sizes. Top: numerical calculation of the constrained path integral. The r = 1 nm curve coincides with the r = 0 cyclization factor. Bottom: closure factor obtained by the extension of the Shimada and Yamakawa calculation, which includes r. Theoretical results are in very good agreement with Monte Carlo simulations obtained by Podtelezhnikov and Vologodskii (filled circles, •) and in fair agreement with Brownian dynamics simulations obtained by Langowski et al. (empty squares, □). Inset of the bottom left panel: lowest bending energy configurations for 100 bp and r = 0 (bottom) or r = 10 nm (top); the closure of the r = 10 nm configuration is shown by a thin line.

Our results for the closure factor and the looping free energy are compared in Fig. 4 with the Monte Carlo (MC) and Brownian Dynamics (BD) simulation results, obtained respectively by Podtelezhnikov and Vologodskii (shown in the figure with filled circles, •) and Langowski et al. [17, 18] (displayed in the figure with empty squares, □). Numerical and SPA data (which for r = 1 nm converge to the Shimada and Yamakawa curve) are in better agreement with the MC data than with the BD ones; indeed numerical, SPA and MC data are obtained with a simpler model than the BD ones, which does not include twist rigidity and electrostatic effects.
Considerations about Lac operon repression energetics will help us illustrate our results and compare them with previous results. Expression of the proteins enabling the bacterium E. coli to metabolize lactose can be prevented at the transcriptional level by cyclizing two different sequences, the smallest one including the operon promoter. Let O1O3 = 76 bp and O1O2 = 385 bp denote these two resulting DNA loops, where the so-called operators O1, O2 and O3 are the small specific DNA sequences (10 bp or so) at which the tetrameric repressor protein LacR can bind, thus clamping the desired loop. Notice that both processes are needed for efficient repression: although O1O3 contains the operon promoter, its cyclization is much less probable than that of O1O2 (see the cyclization factor J(L) in Fig. 3). The LacR size is estimated from its crystallized structure to be r ≈ 13 nm. In the in vitro experiments many parameters are under control, among which the operator sequences and locations. The distance between the two operators, which defines the length of the DNA loop, has been fixed to about 100 bp. Our results are in good agreement with the experimentally measured stability of a 114 bp DNA loop mediated by a LacR protein, obtained by Brenowitz et al. in 1991. By measuring the proportion of looped complexes present in a solution with respect to the unlooped molecules, they obtained a looping free energy of 20.3 ± 0.3 k_BT, to which they associated a closure factor of 8 × 10^-10 M. From Fig. 4 the closure factor of a loop of 114 bp with a protein bridge of r = 10 nm is J(10 nm, 114 bp) = 10^-9 M, to which we associate, from formula (9), a cyclization free energy ΔG(10 nm, 114 bp) = 12 k_BT. Note that the very good agreement between the closure factors contrasts with the bad agreement between the cyclization free energies. The latter could have been calculated considering a different reaction volume, or it could also include the competition with configurations that do not allow the formation of a loop (see Fig. 2 of that work). To explain the high value found for the closure factor, Brenowitz et al. already included the size of the protein in the analysis of their results by comparing their J result with the value expected for the cyclization probability of a free DNA when the length of the protein is included in the size of the loop.

Another result on the DNA loop mediated by the LacR protein has been obtained by Balaeff et al. [13], who have numerically calculated the elastic energy of the O1O3 loop from a WLC model that also includes the twist rigidity, the short range electrostatic repulsion and the details of the LacR/DNA complex crystal structure. The elastic energy is estimated to be 23 k_BT in [13], of which 81% (that is 18 k_BT) is due to the bending and 19% to the unwinding. The bending energy of 18 k_BT is to be compared to the saddle point energy (14) of 15 k_BT for a 75 bp loop with an end-to-end extension of r = 10 nm. The corresponding free energy of loop formation, obtained from (18) after integration over the reacting volume (8), is ΔG(10 nm, 75 bp) = 14 k_BT (see Fig. 4, bottom right). For a 400 bp loop, since LacR is only ≈ 10% of the loop length, its size is expected to play a less important role. Indeed the cyclization probability does not depend on r for r < 10 nm, and the free energy of forming such a loop decreases from 15 k_BT for r = 1 nm to 8 k_BT for r = 10 nm only because the reaction volume increases, which lowers the free energy by 3 × ln(10) ≈ 7 k_BT.



IV. EFFECT OF THE PRESENCE OF A KINK 



The previous model of protein mediated DNA looping (section III) assumes that the only intervening proteins are the ones clamping the loop ends (e.g., the Lac operon repressor LacR). Actually, regulation phenomena involve several proteins which may bind along the whole DNA. Indeed naked DNA situations barely exist in vivo. For instance, single molecule manipulations [5] have shown that efficient Gal operon repression needs a stiffness loss of the 113 bp DNA portion to be looped. The HU protein produces such a loss by kinking the sequence.



A. Numerical Calculation of J for a Bridged and Kinked Loop 

Such stiffness loss may be taken into account in an effective way by kinking the WLC at half-length L/2. Let us call θ− and θ+ the angles of the DNA just before and after the kink respectively. We assume that the kink plane is vertical and choose it to define the origin of the azimuthal angle φ: φ− = φ+ = 0. Using the quantum language of section II A we replace the calculation of the evolution operator Z(L, f) with its kinked counterpart

$$Z_{\text{kinked}}(L, f) = \langle \text{final} |\, \exp\!\left[-\frac{L}{2A} H(f)\right] | \theta_+, \phi_+ \rangle \times \langle \theta_-, \phi_- |\, \exp\!\left[-\frac{L}{2A} H(f)\right] | \text{initial} \rangle$$

$$= \sum_{\ell, \ell' \ge 0} \langle 0, 0 |\, \exp\!\left[-\frac{L}{2A} H(f)\right] | \ell, 0 \rangle\, \langle \ell', 0 |\, \exp\!\left[-\frac{L}{2A} H(f)\right] | 0, 0 \rangle \times Y_\ell^0(\theta_+, 0)\, Y_{\ell'}^0(\theta_-, 0), \tag{19}$$

where we have used the change of basis from angles to spherical harmonics

$$| \theta_\pm, \phi_\pm = 0 \rangle = \sum_{\ell \ge 0} Y_\ell^0(\theta_\pm, 0)\, | \ell, 0 \rangle. \tag{20}$$

Although the calculations are a little more involved, the evaluation scheme for the cyclization factor J remains unchanged in its principle (section II A). We now have "|ℓ ≠ 0, 0⟩ elements" corresponding to particular orientations arriving at (−) and leaving from (+) s = L/2. The kink angle κ is intuitively defined from these WLC tangent vectors at half-length (in spherical coordinates) through



(21) 



B. Saddle Point Approximation of J for a Bridged and Kinked Loop 

FIG. 5: Lowest bending energy configurations for a 100 bp DNA loop with an end-to-end extension r = 10 nm and kinks κ = 90°, 120°, 150°, 180° respectively. The gray configurations are the closed loops used in the calculation of the fluctuations.

The saddle-point calculation of section II B can be straightforwardly extended to the case of a kinked loop. In Fig. 5 we show the configurations with the lowest bending energy for a 100 bp DNA loop with an end-to-end extension r = 10 nm. The kink is accounted for by a bending angle in the middle of the chain, κ = 2 × θ(L/2) − π, a priori different from the previous trivial value κ = π, ranging from 150° to 90°. We introduce the phase ψ = arcsin[sin((π − κ)/4)/x]. The parameter x is now obtained from the equation



[l + r/L] K{x'^) - K{'tP,x'^) 
and the total bending energy is 

(3AE{r, L, k) 



= 2 



E{x^) -E{^P,x^) 



L/A 



K{x^) - K{'iIj,x^) {2x^ -l + r/L) 



In analogy with (|15|1 we obtain the end-to-end extension r PDF 

Q(r, L, k) = Csy{L + 2r) exp[-(3AE{r, L, k)], 



(22) 



(23) 



(24) 



where Csy is given in (I15|l . from wich we calculate J(r, L, k) through formula ©. Note that (|24|) reduces to the loop 
probability given in [23] for a closed and kinked DNA. Again the good agreement obtained with numerical results 
allows us to establish a semi-analytical formula for the loop probability with a finite interacting volume and kink. 
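The kinked saddle point can be evaluated along the same lines as the unkinked one, replacing the complete elliptic integrals by their differences with the incomplete ones at the phase ψ. The sketch below makes several assumptions that should be read as such: the phase is taken as ψ = arcsin[sin((π − κ)/4)/x], the energy carries the prefactor 4[K(x²) − K(ψ, x²)]²/(L/A), the integrals over [ψ, π/2] are computed with a plain Simpson rule, and the function names are hypothetical.

```python
import math


def _simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0


def _arc_integrals(m, psi):
    """Return K(m) - K(psi, m) and E(m) - E(psi, m) as integrals over [psi, pi/2]."""
    dK = _simpson(lambda p: 1.0 / math.sqrt(1.0 - m * math.sin(p) ** 2), psi, math.pi / 2)
    dE = _simpson(lambda p: math.sqrt(1.0 - m * math.sin(p) ** 2), psi, math.pi / 2)
    return dK, dE


def kinked_bending_energy(kappa_deg, r_over_L, L_over_A):
    """beta * Delta E(r, L, kappa) in k_B T; kappa = 180 degrees means no kink."""
    s = math.sin((math.pi - math.radians(kappa_deg)) / 4.0)

    def residual(m):
        psi = math.asin(min(1.0, s / math.sqrt(m)))
        dK, dE = _arc_integrals(m, psi)
        return (1.0 + r_over_L) * dK - 2.0 * dE, dK

    lo, hi = s * s + 1e-9, 1.0 - 1e-9      # x^2 must exceed sin^2((pi - kappa)/4)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if residual(mid)[0] < 0.0:
            lo = mid
        else:
            hi = mid
    m = 0.5 * (lo + hi)
    dK = residual(m)[1]
    return 4.0 * dK * dK / L_over_A * (2.0 * m - 1.0 + r_over_L)


for kappa in (180.0, 150.0, 120.0, 90.0):
    print(kappa, kinked_bending_energy(kappa, 0.0, 1.0))
```

For κ = 180° the routine falls back on the closed-loop value of about 14.06 in units of A/L, and the energy decreases steadily as the kink becomes sharper.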



C. Results for a Kinked and Bridged Loop 



FIG. 6: Closure factor (left) and free energy (right) for a loop with r = 10 nm and a kink κ in the middle of the chain: κ = 180° (full lines); κ = 150° (dotted lines); κ = 120° (dashed lines); κ = 90° (long dashed lines). Top: numerical calculation of the constrained path integral. Bottom: extension of the Shimada and Yamakawa calculation (black lines) and approximate formula (25) given in the text (gray lines). Numerical results are in very good agreement with the SPA and with the approximate formula (25). Inset: Brownian dynamics simulation points obtained by Langowski and collaborators (empty squares, □), fitted with a simple formula by Rippe [19].

In Fig. 6 results for the looping probability density (left) and free energy (right) are shown for a typical end-to-end extension r = 10 nm and kinks κ = 150°, 120°, 90°. The curve with no kink (κ = 180°) is also shown for comparison. The numerical results on the top of the figure are in very good agreement with the extension of the Shimada and Yamakawa saddle point approach on the bottom of the figure (black lines). In Fig. 6 (inset) we show the results obtained from BD simulations by Langowski et al. for J(10 nm, L, κ) (empty squares, □), fitted by Rippe with a simple formula containing one fitting parameter for each curve. The curves in the inset of Fig. 6 reproduce the same behavior with L and κ as our numerical (top of Fig. 6) or SPA (bottom of Fig. 6) curves. The numerical gap with the BD results increases for large kinks: at κ = 90° our closure factor is ten times larger than the closure factor obtained by BD simulations. The electrostatic and twist rigidity effects could indeed play a more important role when the chain is kinked. Note that the numerical results range from 100 bp to 500 bp, while the saddle point results range from 75 bp to 1500 bp. The range of lengths of the SPA is limited by the validity of the approximation, ≈ 1500 bp, shown in Fig. 3. As an example, the closure factor J(10 nm, 113 bp, κ) of a 113 bp
fragment, obtained by the numerical calculations, increases from the value of 10^-9 M for a non-kinked loop (κ = 180°) to 4 × 10^-8 M, 2 × 10^-6 M and 4 × 10^-5 M for κ = 150°, 120° and 90° respectively. The corresponding looping free energy is ΔG(10 nm, 113 bp, κ) = 13 k_BT, 9 k_BT, 5 k_BT and 2.5 k_BT for κ = 180°, 150°, 120° and 90° respectively. With the saddle point approach we find similar results: J(10 nm, 113 bp, κ) = 2 × 10^-9 M, 10^-7 M, 5 × 10^-6 M and 8 × 10^-5 M, while ΔG(10 nm, 113 bp, κ) = 11 k_BT, 8 k_BT, 4.4 k_BT and 1.7 k_BT. As shown in Fig. 6, the presence of the kink becomes irrelevant for DNA segments larger than about 1500 bp. The example of a 113 bp DNA segment has been chosen to compare the results with the single molecule experiments on the GalR mediated loop of an ≈ 113 bp DNA portion between the two operators. From the kinetics of the loop formation mediated by the GalR and HU proteins, Lia et al. have deduced a looping free energy of 12 k_BT, which with respect to our values should correspond to a kink angle of more than 150° or to an end-to-end extension smaller than r = 10 nm. Another significant change in the cyclization probability is that the stiffness loss induced by κ reduces the most probable loop length from 500 bp for a non-kinked DNA to 340 bp and 190 bp (from the numerical calculation) or to 300 bp and 150 bp (from the SPA) for kinks of 150° and 120° respectively. Note that for a kinked loop with an end-to-end extension r the minimal length L_0 corresponding to the rigid rod-like configuration fulfills the relation L_0 sin(κ/2) = r, and therefore it is ≈ 42 bp for κ = 90° instead of ≈ 30 bp for κ = 180°. For κ = 90° the most probable loop length is the rigid kinked rod-like configuration of the two half-DNA portions. To catch both kink and protein bridge effects in a simple formula, we have calculated the cyclization factor with the extension of the Shimada and Yamakawa formula for a kinked closed loop of length (L + 2r). This approach is similar to what was suggested in 1991 by Brenowitz et al. to
interpret their experimental data, i.e. to directly consider the protein as part of the length of the loop. A linear 
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FIG. 7: The "x" points: saddle point energy in units of L/A (that is £ = /S.E x L/A) for the saddle point configurations (also 
displayed in the figure) with kink angles of k = 90°, 120°, 150°, 180°. Dotted line: linear interpolation used in formula 1251 . 



fit llg of the bending energy H23|l for the optimal closed configuration (r — 0) in the presence of a kink (expressed 
in degrees): (3AE{r = 0,L,k) w (—7.1 + 0.1155 /c) /(L/A), is shown in Fig. [7| It gives the following approximated 
formula for the closure factor as a function of the protein size r, the length L of the DNA, and the kink angle k 

Lapprox(i, r, k) = Csy{L + 2r) exp 

where Csy is given in (|16() . Notice the integration step ^ has been skipped since it does not make any significant 
difference. Formula H25|) allows us to obtain a simple prediction for the loop probability in presence of a kink in 
the middle of the sequence and a finite separation between the extremities. As shown in Fig. |S1 this formula (gray 
lines) is in good agreement with the loop probability obtained with the exact calculation of the saddle point energy 
of the open configuration (full line). In particular Fig. shows that for kink angles in the range 90° < k < 150° this 
simple formula works remarkably well for lengths L larger than about 5r, that is 150 bp for r = 10 nm. For smaller 
lengths the optimal configuration is more a rigid rod-like and cannot be approximated by a closed loop. Similar 
simple formulas that includes a kink angle k and a finite end-to-end distance r in an effective way have also been 
written down by RiPPE or RiNGROSE to fit their brownian dynamics simulation or experimental data |2^, but 
these formulas contain a parameter that must be fitted for each values of r and k from the data points (Fig.l^. 



7.1 - 0.1155k 

(L + 2r)/A 



(25) 
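
As a rough numerical illustration of the two relations used in this section, the sketch below evaluates the minimal rod-like length L_0 = r / sin(κ/2) and the approximate closure factor of formula (25). It is only a sketch under stated assumptions: the rise of 0.34 nm per base pair and the persistence length A = 50 nm are conventional values assumed here, the function names are hypothetical, and the prefactor C_SY of formula (16) is not reproduced, so it must be supplied by the caller as a function of length.

```python
import math

BP_NM = 0.34  # ASSUMPTION: rise per base pair in nm (conventional B-DNA value)
A_NM = 50.0   # ASSUMPTION: persistence length A in nm (conventional value)


def minimal_length_bp(r_nm, kappa_deg):
    """Shortest kinked rod-like loop, from L0 * sin(kappa/2) = r, in base pairs."""
    l0_nm = r_nm / math.sin(math.radians(kappa_deg) / 2.0)
    return l0_nm / BP_NM


def fitted_bending_energy(length_nm, kappa_deg, a_nm=A_NM):
    """Linear fit beta*DeltaE(r=0, L, kappa) = (-7.1 + 0.1155*kappa) / (L/A)."""
    return (-7.1 + 0.1155 * kappa_deg) / (length_nm / a_nm)


def approx_closure_factor(length_nm, r_nm, kappa_deg, c_sy, a_nm=A_NM):
    """Formula (25): C_SY(L + 2r) * exp(-beta*DeltaE), evaluated at length L + 2r.

    c_sy is a caller-supplied function of the effective length (in nm); it
    stands in for the prefactor of formula (16), which is not reproduced here.
    """
    l_eff = length_nm + 2.0 * r_nm
    return c_sy(l_eff) * math.exp(-fitted_bending_energy(l_eff, kappa_deg, a_nm))


if __name__ == "__main__":
    for kappa in (90, 180):
        print(kappa, round(minimal_length_bp(10.0, kappa), 1))
```

With these assumed constants, `minimal_length_bp(10.0, 90)` gives about 41.6 bp and `minimal_length_bp(10.0, 180)` about 29.4 bp, consistent with the ≈ 42 bp and ≈ 30 bp quoted in the text. Passing the prefactor as an argument keeps the sketch independent of the exact form and units chosen for C_SY.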



V. CONCLUSION 



We performed both numerical and analytical calculations of the closure factor J, including in the presence of a protein 
bridge and of a protein-mediated kink. More precisely, we have numerically calculated the path integral of 
the WLC polymer model under the constraints of a fixed end-to-end distance r and a kink κ in the middle of the 
DNA portion. Analytically, we have extended the Shimada and Yamakawa saddle point approximation [?] to the 
case of a bridged and kinked loop. We have seen that the formation of DNA loops is significantly sensitive to the 
size of the protein bridge when this size r is more than 10% of the loop length L, that is for loops shorter than about 
300 bp (or 100 nm) for a typical protein bridge size of r = 10 nm. To give an example, the closure factor for a 100 bp DNA segment 
increases from J(100 bp, 0) ~ 10~^^ M to J(100 bp, 10 nm) ≈ 10~^ M. Correspondingly, the looping free energy decreases 
from ΔG(100 bp, 0) = 24 k_B T to ΔG(100 bp, 10 nm) = 13 k_B T. A kink ranging from 150° to 90° produces a 
significant increase of J for DNA fragments of lengths up to about 2500 bp. For instance, the closure factor for a 
100 bp DNA segment with a protein bridge r = 10 nm and a kink of 90° is J(100 bp, 10 nm, 90°) ≈ 10"^ M, and 
the corresponding looping free energy is ΔG(100 bp, 10 nm, 90°) ≈ 2 k_B T. A kink also changes the most probable loop 
length from 500 bp (no kink) to about 175 bp for a kink of 120°, going to the rigid kinked rod-like configuration for 
smaller κ values. This is an interesting mechanism because the loop lengths involved in in vivo DNA processing by proteins 
span a wide range. Our results were compared to previous analytical approximations (in particular 
the results of the Gaussian model, the Wilhelm and Frey expansion [14] and the Shimada and Yamakawa 
formula [?]) and numerical calculations (in particular the Monte Carlo simulation data obtained by Podtelezhnikov 
and Vologodskii [?], and the Brownian dynamics simulation data obtained by Langowski et al. [?]) 
as well as experimental results [?] (e.g., those obtained by Lia et al. on the looping dynamics mediated 
by the GalR and HU proteins). Finally, a simple formula (25) including both the protein bridge and kink effects has 
been proposed. Unlike the existing formulas that include both these effects [19], this formula contains no 
adjustable parameters. 

Many effects omitted in this work can still be included without significant changes to the numerical algorithm. The kink 
we considered is permanent (that is, not thermally excited), site-specific (at half-length) and rigid (κ fixed). 
Although this rigidity seems at first glance relevant to most protein bindings to DNA [30], it was pointed out in the works of Yan 
et al. [20, ?] that kinks may also be semi-flexible (exhibiting higher or lower rigidities than the bare DNA) or 
even fully flexible [21, 31]. For instance, the HU/DNA complex was recently observed to be very flexible under specific 
experimental conditions [32]. Flexible hinges were also reported to occur along the DNA due to the opening of small 
denaturation bubbles [21, 33], such as the one needed by HU to fit into the double helix. Such flexibility could be 
taken into account [30] in our model. This kind of defect could also be thermally activated, occurring at multiple 
non-specific sites along the DNA [30, 31, ?]. Both effects may be included in our model. Note that using 
effective persistence lengths could turn out to be convenient, although these say little about the kink properties (number, 
location, rigidity, etc.). This would be equivalent to studying the DNA stiffness loss due to sequence effects by 
cutting the WLC into fragments of different stiffness, depending on the CG or AT content of the whole sequence to cyclize. The 
same approach may also allow an approximate study of the polyelectrolyte nature of DNA [36]. Otherwise, an electrostatic 
potential has to be included in the WLC energy. Twist elasticity leads to slight modifications of the quantum analog we 
used, although it requires some care [37]. It is expected to play an important role in looping, especially when a specific 
alignment of the loop extremities is required. Finally, cyclization dynamics could be modeled using a simple two-state 
model where the DNA is "closed" or "open", that is, cyclized or not. Such a study relies on the (static) cyclization 
factor we computed in this article [38]. Direct comparison with experimental lifetime measurements would then be possible [?]. 
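
To make the two-state picture concrete, here is a minimal sketch of the relaxation of the closed-state probability for a generic open/closed model with constant rates. It is an illustration only: the function names and the rates k_close and k_open are hypothetical inputs, and how the closing rate would be tied to the cyclization factor J is not specified here. The only relations used are the standard two-state relaxation law and detailed balance.

```python
import math


def closed_fraction(t, k_close, k_open, p0=0.0):
    """Probability of the closed state at time t, starting from p0 at t = 0.

    Two-state kinetics: open -> closed at rate k_close, closed -> open at
    rate k_open (hypothetical inputs, same inverse-time units as 1/t).
    """
    total_rate = k_close + k_open
    p_eq = k_close / total_rate  # equilibrium closed fraction
    return p_eq + (p0 - p_eq) * math.exp(-total_rate * t)


def free_energy_difference(k_close, k_open):
    """Closed-minus-open free energy in units of k_B T, from detailed balance."""
    return -math.log(k_close / k_open)


if __name__ == "__main__":
    # Hypothetical rates chosen only to exercise the functions.
    for t in (0.0, 0.5, 5.0):
        print(t, closed_fraction(t, k_close=2.0, k_open=1.0))
```

In such a model the mean lifetimes of the open and closed states are 1/k_close and 1/k_open, which is the kind of quantity a comparison with measured lifetimes would use.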